Abstract

We formulate simple equivalent conditions for the validity of Bayes' formula for conditional densities. We show that for any random variables $X$ and $Y$ (with values in arbitrary measurable spaces), the following are equivalent:

1. $X$ and $Y$ have a joint density w.r.t. a product measure $\mu \times \nu$,

2. $P_{X,Y} \ll P_X \times P_Y$ (here $P_{\bullet}$ denotes the distribution of $\bullet$),

3. $X$ has a conditional density $p(x \mid y)$ w.r.t. a $\sigma$-finite measure $\mu$,

4. $X$ has a conditional distribution $P_{X|Y}$ such that $P_{X|y} \ll P_X$ for all $y$,

5. $X$ has a conditional distribution $P_{X|Y}$ and a marginal density $p(x)$ w.r.t. a measure $\mu$ such that $P_{X|y} \ll \mu$ for all $y$.

Furthermore, given random variables $X$ and $Y$ with a conditional density $p(y \mid x)$ w.r.t. $\nu$ and a marginal density $p(x)$ w.r.t. $\mu$, we show that Bayes' formula

\[
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{\int p(y \mid x)\, p(x)\, d\mu(x)}
\]

yields a conditional density $p(x \mid y)$ w.r.t. $\mu$ if and only if $X$ and $Y$ satisfy the above conditions. Counterexamples illustrating the nontriviality of the results are given, and implications for sequential adaptive estimation are considered.

AMS2000 subject classifications: 60A05; 60A10. 

1 Preliminaries 

Let $(\Omega, \mathcal{F}, \Pr)$ be a probability space. A random variable is a measurable mapping $X : \Omega \to \mathbb{X}$ to some measurable space $(\mathbb{X}, \mathcal{X})$ (usually the real line $\mathbb{R}$ equipped with the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$). The distribution of the random variable is the measure $P_X : S \mapsto \Pr(X^{-1}(S))$ induced on $\mathbb{X}$. If $P_X(S) = \int_S p_X(x)\, d\mu(x)$ for all $S \in \mathcal{X}$ for some measurable function $p_X : \mathbb{X} \to [0, \infty]$ and some measure $\mu : \mathcal{X} \to [0, \infty]$, then $p_X$ is called a density of $X$ w.r.t. $\mu$. For brevity, we leave out the subscript of the density when it matches the arguments, i.e., instead of $p_X(x)$, we write simply $p(x)$.

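We define the product $\mu \times \nu : \mathcal{X} \otimes \mathcal{Y} \to [0, \infty]$ of arbitrary measures $\mu : \mathcal{X} \to [0, \infty]$ and $\nu : \mathcal{Y} \to [0, \infty]$ by

\[
(\mu \times \nu)(S) := \inf \left\{ \sum_{k=1}^{\infty} \mu(A_k)\, \nu(B_k) : \{A_k\}_{k=1}^{\infty} \subset \mathcal{X},\ \{B_k\}_{k=1}^{\infty} \subset \mathcal{Y},\ S \subset \bigcup_{k=1}^{\infty} A_k \times B_k \right\},
\]

where $\mathcal{X} \otimes \mathcal{Y}$ denotes the $\sigma$-algebra generated by all measurable rectangles.

Theorem (Fubini-Tonelli). Suppose $(\mathbb{X}, \mathcal{X}, \mu)$ and $(\mathbb{Y}, \mathcal{Y}, \nu)$ are measure spaces and $f : \mathbb{X} \times \mathbb{Y} \to [-\infty, \infty]$ is a measurable function. If either $f$ is integrable or $f$ is nonnegative with $\sigma$-finite support $\{(x, y) : f(x, y) \ne 0\}$, then

\[
\int f(x, y)\, d(\mu \times \nu)(x, y) = \int \left[ \int f(x, y)\, d\mu(x) \right] d\nu(y) = \int \left[ \int f(x, y)\, d\nu(y) \right] d\mu(x).
\]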



Proof. Follows from (Mukherjea, 1972). □



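If a pair of random variables $(X, Y) : \Omega \to \mathbb{X} \times \mathbb{Y}$ has a joint density $p(x, y)$ w.r.t. $\mu \times \nu$, then we can apply Fubini's theorem to write the marginal distributions as

\[
\begin{aligned}
P_X(U) &= P_{X,Y}(U \times \mathbb{Y}) = \int_U \left[ \int p(x, y)\, d\nu(y) \right] d\mu(x), \\
P_Y(V) &= P_{X,Y}(\mathbb{X} \times V) = \int_V \left[ \int p(x, y)\, d\mu(x) \right] d\nu(y),
\end{aligned}
\]

which implies that $p_X(x) = \int p(x, y)\, d\nu(y)$ and $p_Y(y) = \int p(x, y)\, d\mu(x)$ are marginal densities w.r.t. $\mu$ and $\nu$, respectively.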

A transition measure from $(\mathbb{Y}, \mathcal{Y})$ to $(\mathbb{X}, \mathcal{X})$ is any function $\mu : \mathbb{Y} \times \mathcal{X} \to [0, \infty]$ satisfying the following axioms:

1. for every $y \in \mathbb{Y}$, the function $S \mapsto \mu(y, S)$ is a measure on $\mathcal{X}$,

2. for every $S \in \mathcal{X}$, the function $y \mapsto \mu(y, S)$ is $\mathcal{Y}$-measurable.

The product of a transition measure $\mu : \mathbb{Y} \times \mathcal{X} \to [0, \infty]$ and a $\sigma$-finite measure $\nu : \mathcal{Y} \to [0, \infty]$ is given by

\[
(\mu \times \nu)(S) := \int \mu(y, S_y)\, d\nu(y)
\]

for all $S \in \mathcal{X} \otimes \mathcal{Y}$, where $S_y := \{x : (x, y) \in S\}$. The product is a measure on $\mathcal{X} \otimes \mathcal{Y}$. If a transition measure $P_{X|Y} : \mathbb{Y} \times \mathcal{X} \to [0, \infty]$ satisfying

\[
P_{X,Y} = P_{X|Y} \times P_Y
\]

exists, then it is called a conditional distribution of $X$ given $Y$. We will also use the shorthand $P_{X|y} := P_{X|Y}(y, \cdot)$. Note that a conditional distribution always exists for a random variable in $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$, $(\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty))$, or any other complete separable metric space, but there are spaces where its existence is not guaranteed (Shiryaev, 1996).
If a conditional distribution $P_{X|Y}$ exists and satisfies

\[
P_{X|y}(S) = \int_S p(x \mid y)\, d\mu(x)
\]

for all $S \in \mathcal{X}$, $y \in \mathbb{Y}$ for some measurable nonnegative function $(x, y) \mapsto p(x \mid y)$ and some measure $\mu$, then $p(x \mid y)$ is called a conditional density of $X$ given $Y$. If a joint density $p(x, y)$ exists w.r.t. $\mu \times \nu$, then a conditional density can always be obtained by

\[
p(x \mid y) = \begin{cases} \dfrac{p(x, y)}{p(y)}, & p(y) > 0, \\ 0, & p(y) = 0. \end{cases}
\]

(The value chosen for $p(y) = 0$ is immaterial as the conditional density is only determined $\mu \times P_Y$-a.e.)
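
As a small illustration of the last formula, the following sketch computes a conditional density from a joint density in the simplest setting, where both variables are finite-valued and $\mu$, $\nu$ are counting measures. The function name `conditional_from_joint` and the example distribution are assumptions made for this illustration only; they do not appear in the source.

```python
from fractions import Fraction

def conditional_from_joint(joint):
    """Return p(x | y) and p(y) from a joint density p(x, y).

    `joint` maps (x, y) pairs to probabilities (density w.r.t. counting
    measure). Where p(y) = 0 the conditional density is set to 0.
    """
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    # Marginal density p(y) = sum over x of p(x, y).
    p_y = {y: sum(joint.get((x, y), Fraction(0)) for x in xs) for y in ys}
    cond = {}
    for x in xs:
        for y in ys:
            pxy = joint.get((x, y), Fraction(0))
            cond[(x, y)] = pxy / p_y[y] if p_y[y] > 0 else Fraction(0)
    return cond, p_y

# Hypothetical joint distribution on {0, 1} x {"a", "b", "c"}; "c" has p(y) = 0.
joint = {
    (0, "a"): Fraction(1, 4), (1, "a"): Fraction(1, 4),
    (0, "b"): Fraction(1, 2), (1, "b"): Fraction(0),
    (0, "c"): Fraction(0), (1, "c"): Fraction(0),
}
cond, p_y = conditional_from_joint(joint)
```

Exact fractions are used so that the normalization of each section $x \mapsto p(x \mid y)$ with $p(y) > 0$ can be checked without rounding error.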
2 Regularity conditions for Bayesian estimation



The following theorem gives a set of equivalent conditions under which we can avoid the 
potential problems of nonexistent distributions or densities. 

Theorem 1. Let $(X, Y) : \Omega \to \mathbb{X} \times \mathbb{Y}$ be a pair of random variables. Then, the following are equivalent:

1. $X$ and $Y$ have a joint density w.r.t. a product measure $\mu \times \nu$,

2. $P_{X,Y} \ll P_X \times P_Y$,

3. $X$ has a conditional density $p(x \mid y)$ w.r.t. a $\sigma$-finite measure $\mu$,

4. $X$ has a conditional distribution $P_{X|Y}$ such that $P_{X|y} \ll P_X$ for all $y$,

5. $X$ has a conditional distribution $P_{X|Y}$ and a marginal density $p(x)$ w.r.t. a measure $\mu$ such that $P_{X|y} \ll \mu$ for all $y$.

Obviously the same conditions with the roles of $X$ and $Y$ reversed are also equivalent. Furthermore,

6. if the above conditions hold for $X$ and $Y$, then they also hold for $X' = F(X)$ and $Y' = G(Y)$, where $F : \mathbb{X} \to \mathbb{X}'$ and $G : \mathbb{Y} \to \mathbb{Y}'$ are any measurable functions.

The conditions of the theorem are mild, being satisfied whenever either $X$ or $Y$ is discrete as well as in most practical situations with continuous random variables. However, they preclude in particular the following example:

Example 1. Suppose that $X = Y \sim \mathrm{Uniform}[0, 1]$. The conditional distribution $P_{X|y}(S) = [y \in S]$ is singular w.r.t. $P_X = m_{[0,1]}$, where $m_{[0,1]}$ denotes the restriction of the Lebesgue measure to $[0, 1]$, and so condition 4 of Theorem 1 is not satisfied. The conditional density

\[
p(x \mid y) = [x = y] := \begin{cases} 1, & x = y, \\ 0, & x \ne y \end{cases}
\]

exists w.r.t. the counting measure, but this measure is not $\sigma$-finite and so this density does not satisfy condition 3. Even though the joint distribution can be written as

\[
P_{X,Y}(S) = \int \left[ \int_{S_y} [x = y]\, d\#(x) \right] dm_{[0,1]}(y),
\]

where $\#$ is the counting measure, the integrand $[x = y]$ does not yield the joint density of condition 1 because the function $[x = y]$ is not integrable w.r.t. $\# \times m_{[0,1]}$ and so Fubini's theorem does not hold for the iterated integral.

Example 2. One interpretation of the conditions of Theorem 1 is given by the fact that the Radon-Nikodym derivative in the measure-theoretic definition of mutual information

\[
I(X; Y) = \int dP_{X,Y} \log \frac{dP_{X,Y}}{d(P_X \times P_Y)}
\]

exists precisely when $P_{X,Y} \ll P_X \times P_Y$ (condition 2). In case $P_{X,Y}$ is singular w.r.t. $P_X \times P_Y$, Kolmogorov (1956) defines $I(X; Y) = \infty$. Thus, failure of the conditions of Theorem 1 implies that observation of $Y$ is expected to give an infinite amount of information about $X$ (and, symmetrically, $X$ is expected to give an infinite amount of information about $Y$). In Example 1, observation of $Y$ gives complete information about $X$ and this information is obviously infinite (it would take an infinite number of bits on the average to transmit the precise value of $X \sim \mathrm{Uniform}[0, 1]$). On the other hand, if either $X$ or $Y$ has only a finite number of possible values, then there is only a finite amount of information that can be gained about it; this implies $I(X; Y) < \infty$, and so condition 2 of Theorem 1 is necessarily satisfied.
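
The information-theoretic reading of Example 2 can be explored numerically. The sketch below computes $I(X; Y)$ in bits for finite joint distributions and evaluates it on a discretized stand-in for Example 1, where $X = Y$ is uniform on $n$ points. The discretization and the helper names `mutual_information` and `diagonal_joint` are assumptions of this illustration, not constructions from the source.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a finite joint distribution {(x, y): prob}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    total = 0.0
    for (x, y), p in joint.items():
        if p > 0:  # zero-probability cells contribute nothing
            total += p * math.log2(p / (p_x[x] * p_y[y]))
    return total

def diagonal_joint(n):
    """X = Y uniform on n points: a finite stand-in for Example 1."""
    return {(k, k): 1.0 / n for k in range(n)}

# Mutual information for n = 2, 16, 256 points.
info = [mutual_information(diagonal_joint(2 ** b)) for b in (1, 4, 8)]
```

For $n = 2^b$ points the value is $b$ bits, so it is finite for every finite $n$ but grows without bound as the discretization is refined, in line with the remark that the continuous case carries infinite information.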
2.1 Bayes' theorem

The conditions of Theorem 1 are precisely those under which Bayes' theorem can be applied to a conditional density:

Theorem 2. Let $(X, Y) : \Omega \to \mathbb{X} \times \mathbb{Y}$ be a pair of random variables and suppose that $p(y \mid x)$ is a conditional density of $Y$ given $X$ w.r.t. a measure $\nu$. Then the following are equivalent:

(a) $X$ and $Y$ satisfy the conditions of Theorem 1.

(b) There exists a measurable subset $V \subset \mathbb{Y}$ such that the measure $\nu'(B) := \nu(B \cap V)$ is $\sigma$-finite, $p(y \mid x)$ is a conditional density of $Y$ given $X$ w.r.t. $\nu'$, and

\[
p(y) := \int p(y \mid x)\, dP_X(x)
\]

is a marginal density of $Y$ w.r.t. $\nu'$.¹

(c) Bayes' formula

\[
P_{X|y}(S) := \frac{\int_S p(y \mid x)\, dP_X(x)}{\int p(y \mid x)\, dP_X(x)}
\]

defines a conditional distribution of $X$ given $Y$.

(c') If $p(x)$ is a marginal density of $X$ w.r.t. a measure $\mu$, then

\[
p(x \mid y) := \frac{p(y \mid x)\, p(x)}{\int p(y \mid x)\, p(x)\, d\mu(x)}
\]

is a conditional density of $X$ given $Y$ w.r.t. $\mu$.
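
In the finite case, where the conditions of Theorem 1 hold automatically, the formula in (c') reduces to the familiar normalized product of prior and likelihood. The following minimal sketch evaluates it with $\mu$ taken to be the counting measure; the function name `bayes_posterior` and the coin example are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood, y):
    """Posterior p(x | y) via Bayes' formula w.r.t. counting measure on x.

    prior:      dict x -> p(x)
    likelihood: dict (y, x) -> p(y | x)
    Returns None when the normalizer p(y) is zero.
    """
    weights = {x: likelihood.get((y, x), Fraction(0)) * px for x, px in prior.items()}
    normalizer = sum(weights.values())  # p(y) = sum_x p(y | x) p(x)
    if normalizer == 0:
        return None
    return {x: w / normalizer for x, w in weights.items()}

# Hypothetical two-hypothesis example: a fair or a biased coin, observing heads.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood = {
    ("H", "fair"): Fraction(1, 2), ("T", "fair"): Fraction(1, 2),
    ("H", "biased"): Fraction(9, 10), ("T", "biased"): Fraction(1, 10),
}
posterior = bayes_posterior(prior, likelihood, "H")
```

Returning `None` for a zero normalizer is a design choice of this sketch; in the theorem such $y$ form a $P_Y$-null set, on which the conditional density may be chosen arbitrarily.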

The following example shows that in some pathological cases, it is possible that (b) holds as stated above, but $p(y)$ is not a density of $Y$ w.r.t. the original measure $\nu$.

Example 3. Let $C \subset [0, 1]$ be a meagre set with positive Lebesgue measure (e.g., a fat Cantor set) and define

\[
S := \{(x, y) \in [0, 1]^2 : x + y \in C \text{ or } x + y - 1 \in C\}
\]

so that every section of $S$ is a cyclically shifted version of $C$. Let $P_X$ be the restriction of the Lebesgue measure to $[0, 1]$, and define $P_{Y|X}$ through the conditional density $p(y \mid x) = [y \in S_x \cup \{0\}]$ w.r.t. the measure

\[
\nu(B) := [0 \in B] + \begin{cases} 0, & B \text{ meagre}, \\ \infty, & \text{otherwise}. \end{cases}
\]

As the meagre sets form a $\sigma$-ideal, this definition indeed yields a countably additive measure. As every $S_x$ is meagre, we obtain

\[
\begin{aligned}
P_{X,Y}(R) &= \int \left[ \int_{R_x} p(y \mid x)\, d\nu(y) \right] dP_X(x) \\
&= \int \nu(R_x \cap (S_x \cup \{0\}))\, dP_X(x) \\
&= \int [0 \in R_x]\, dP_X(x) \\
&= P_X(\{x : (x, 0) \in R\}),
\end{aligned}
\]

which is a well-defined joint distribution (yielding $(X, Y)$ uniformly distributed on $[0, 1] \times \{0\}$) and satisfies the conditions of Theorem 1. However, the function

\[
p(y) = \int p(y \mid x)\, dP_X(x) = \begin{cases} 1, & y = 0, \\ P_X(C), & \text{otherwise} \end{cases}
\]

is not a density of $Y$ w.r.t. $\nu$, because

\[
\int p(y)\, d\nu(y) = 1 \cdot \nu(\{0\}) + P_X(C) \cdot \nu((0, 1]) = 1 + P_X(C) \cdot \infty = \infty.
\]

Nonetheless, in accordance with Theorem 2(b), $p(y)$ is a density w.r.t. the restriction of $\nu$ to the $\sigma$-finite set $\{0\}$.

¹ If $\nu$ is semifinite (for every nonnull $B \in \mathcal{Y}$ there exists $B' \subset B$ such that $0 < \nu(B') < \infty$), then $p(y)$ is a density of $Y$ w.r.t. the original measure $\nu$, too.

2.2 Adaptive sequential estimation 



In adaptive sequential estimation (see, e.g., MacKay, 1992; Kujala and Lukka, 2006; Kujala, 2010), a random variable $\Theta$ is estimated based on a sequence $y_{x_1}, y_{x_2}, \ldots$ of independent (given $\theta$) realizations from some conditional densities $p(y_{x_t} \mid \theta)$ indexed by trial placements $x_t$, each of which can be adaptively chosen from some set $\mathbb{X}_t \subset \mathbb{X}$ based on the outcomes $\{y_{x_1}, \ldots, y_{x_{t-1}}\}$ of the earlier observations. The placement decision function $d : \{y_{x_1}, \ldots, y_{x_{t-1}}\} \mapsto x_t$ can be deterministic or random, and we also assume that there exists a special placement value that signals the end of the experiment. Thus, the outcome $Y_d = \{Y_{x_1}, \ldots, Y_{x_T}\}$ of a whole experiment governed by the decision function $d$ can be considered as a single random variable with a random number $T$ of components. It is natural to ask the following question: under what conditions do the whole-experiment outcome $Y_d$ and $\Theta$ satisfy the conditions of Theorem 1?

If $Y_x$ and $\Theta$ satisfy the conditions of Theorem 1 for all $x$, then one can apply Bayes' formula to any finite set $y = \{y_{x_t}\}_{t=1}^{T}$ of results sequentially:

\[
p(\theta \mid y) \propto p(\theta)\, p(y_{x_1} \mid \theta) \cdots p(y_{x_T} \mid \theta).
\]

This implies that $P_{\Theta|y} \ll P_\Theta$ for all $y$ (condition 4) and as this condition makes no reference to the distribution of $y$, it follows that regardless of the decision function $d$, the whole-experiment outcome variable $Y_d$ has a joint density with $\Theta$ provided that the experiment terminates with probability one (so that $y$ is almost surely finite). However, if there is a positive probability that the experiment does not terminate, then it is possible that no joint density of $\Theta$ and $Y_d$ exists, even for constant placements:

Example 4. Suppose that $X \sim \mathrm{Uniform}[0, 1]$ and the random variables $Y_t \in \{0, 1\}$ for $t = 1, 2, \ldots$ are defined as a binary representation of $X$. Then, although the conditional density $p(x \mid y_1, \ldots, y_T)$ w.r.t. the Lebesgue measure is well-defined for any finite set of observations, the full sequence of results $Y := \{Y_t\}_{t=1}^{\infty}$ cannot have any joint density with $X$, because by condition 6 of Theorem 1, that would imply that also the transformed variable

\[
Y' := F(Y) := \sum_{t=1}^{\infty} 2^{-t} Y_t
\]

would have a joint density with $X = Y'$, which contradicts the negative result of Example 1.

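
The finite-observation part of Example 4 can be made concrete. Given the first $T$ binary digits of $X$, the posterior is uniform on a dyadic interval of length $2^{-T}$, so its Lebesgue density on that interval is $2^T$. The sketch below computes this interval exactly; the function name `posterior_interval` is an assumption of this illustration, and the ambiguity of binary expansions at dyadic rationals (a null set) is ignored.

```python
from fractions import Fraction

def posterior_interval(bits):
    """Support and density of p(x | y_1..y_T) for X ~ Uniform[0, 1].

    `bits` are the first T binary digits of X. The posterior is uniform on
    an interval of length 2**-T, so its Lebesgue density there is 2**T.
    """
    lo = Fraction(0)
    for t, b in enumerate(bits, start=1):
        lo += Fraction(b, 2 ** t)  # add b * 2^-t
    width = Fraction(1, 2 ** len(bits))
    return lo, lo + width, 1 / width

lo, hi, density = posterior_interval([1, 0, 1])
```

As $T$ grows, the interval shrinks to a point and the density value $2^T$ diverges, which is the finite-sample shadow of the statement that the full sequence $Y$ has no joint density with $X$.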
2.3 Proofs

Proof of Theorem 1. 2 $\Rightarrow$ 5: Using the joint density $p := dP_{X,Y} / d(P_X \times P_Y)$, we obtain the induced marginal density $p(x)$ w.r.t. the measure $P_X$ and the conditional density $p(x \mid y)$, which induces a conditional distribution $P_{X|y} \ll P_X$.

5 $\Rightarrow$ 4: Denoting $N := \{x \in \mathbb{X} : p(x) = 0\}$, we have

\[
0 = \int_N p(x)\, d\mu(x) = P_X(N) = \int P_{X|y}(N)\, dP_Y(y),
\]

which implies $P_{X|y}(N) = 0$ for $P_Y$-a.e. $y$. However, as $P_{X|y}$ is only determined for $P_Y$-a.e. $y$, we are free to modify it so that $P_{X|y}(N) = 0$ for all $y$. We will show that this $P_{X|y}$ is dominated by $P_X$ for all $y$. Let $S \in \mathcal{X}$ be such that $P_X(S) = 0$. Then, we have

\[
0 = P_X(S \setminus N) = \int_{S \setminus N} \underbrace{p(x)}_{> 0}\, d\mu(x),
\]

which implies $\mu(S \setminus N) = 0$. As $P_{X|y} \ll \mu$, we have $P_{X|y}(S \setminus N) = 0$, but as also $P_{X|y}(N) = 0$, we obtain $P_{X|y}(S) = 0$. Thus, $P_{X|y} \ll P_X$ for all $y$.

4 $\Rightarrow$ 3: Choose $\mu = P_X$.

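3 $\Rightarrow$ 1: By the definition of conditional density and Fubini's theorem, we have

\[
P_{X,Y}(S) = \int \left[ \int_{S_y} p(x \mid y)\, d\mu(x) \right] dP_Y(y) = \int_S p(x \mid y)\, d(\mu \times P_Y)(x, y).
\]

Thus, $p(x \mid y)$ is a joint density of $X$ and $Y$ w.r.t. $\mu \times P_Y$.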

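1 $\Rightarrow$ 2: Suppose that $p(x, y)$ is a joint density w.r.t. $\mu \times \nu$ and let $S \in \mathcal{X} \otimes \mathcal{Y}$ be an arbitrary measurable set such that $(P_X \times P_Y)(S) = 0$. We will show that then $P_{X,Y}(S) = 0$. Denoting

\[
\begin{aligned}
U &:= \{x \in \mathbb{X} : p(x) = 0\}, \\
V &:= \{y \in \mathbb{Y} : p(y) = 0\}, \\
N &:= (U \times \mathbb{Y}) \cup (\mathbb{X} \times V),
\end{aligned}
\]

we have $P_X(U) = 0$ and $P_Y(V) = 0$. Furthermore, as $\mu \times \nu$ is $\sigma$-finite on $S \setminus N$, Fubini's theorem yields

\[
0 = (P_X \times P_Y)(S \setminus N) = \int_{S \setminus N} \underbrace{p(x)\, p(y)}_{> 0}\, d(\mu \times \nu)(x, y),
\]

which implies that $(\mu \times \nu)(S \setminus N) = 0$ and so $P_{X,Y}(S \setminus N) = 0$. Thus,

\[
P_{X,Y}(S) \le P_{X,Y}(S \setminus N) + \underbrace{P_{X,Y}(U \times \mathbb{Y})}_{= P_X(U)} + \underbrace{P_{X,Y}(\mathbb{X} \times V)}_{= P_Y(V)} = 0.
\]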



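2 $\Rightarrow$ 6: Suppose that $F : \mathbb{X} \to \mathbb{X}'$ and $G : \mathbb{Y} \to \mathbb{Y}'$ are arbitrary measurable mappings. We show that $P_{X,Y} \ll P_X \times P_Y$ implies $P_{F(X),G(Y)} \ll P_{F(X)} \times P_{G(Y)}$. For any $S \in \mathcal{X}' \otimes \mathcal{Y}'$,

\[
0 = (P_{F(X)} \times P_{G(Y)})(S) = (P_X \times P_Y)\Bigl( \bigcup_{(x, y) \in S} F^{-1}(x) \times G^{-1}(y) \Bigr)
\]

implies

\[
0 = P_{X,Y}\Bigl( \bigcup_{(x, y) \in S} F^{-1}(x) \times G^{-1}(y) \Bigr) = P_{F(X),G(Y)}(S),
\]

where $F^{-1}$ and $G^{-1}$ denote the preimage sets. □

Proof of Theorem 2. (a) $\Rightarrow$ (b) Denoting $S := \{(x, y) \in \mathbb{X} \times \mathbb{Y} : p(y \mid x) > 0\}$, we obtain

\[
P_{X,Y}((\mathbb{X} \times \mathbb{Y}) \setminus S) = \int \left[ \int_{\mathbb{Y} \setminus S_x} p(y \mid x)\, d\nu(y) \right] dP_X(x) = 0,
\]

which means that $S$ is a full set w.r.t. $P_{X,Y}$. Denoting

\[
\begin{aligned}
p(y) &:= \int p(y \mid x)\, dP_X(x), \\
N &:= \{y \in \mathbb{Y} : p(y) = 0\} = \{y \in \mathbb{Y} : P_X(S_y) = 0\},
\end{aligned}
\]

we have

\[
(P_X \times P_Y)(S \cap (\mathbb{X} \times N)) = \int_N P_X(S_y)\, dP_Y(y) = 0,
\]

and so the assumption $P_{X,Y} \ll P_X \times P_Y$ (condition 2) implies $P_{X,Y}(S \cap (\mathbb{X} \times N)) = 0$.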

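Let $\mathcal{S}$ denote the class ($\sigma$-ideal) of all $V \in \mathcal{Y}$ such that $\nu$ is $\sigma$-finite on $V$. Then, the supremum $M := \sup_{V \in \mathcal{S}} \int_V p(y)\, d\nu(y)$ is obviously attained for some $V \in \mathcal{S}$, and for this $V$, Fubini's theorem yields

\[
M = \int_V p(y)\, d\nu(y) = \int_V \left[ \int p(y \mid x)\, dP_X(x) \right] d\nu(y) = \int \left[ \int_V p(y \mid x)\, d\nu(y) \right] dP_X(x) = P_Y(V),
\]

implying that $M$ is finite. As $M < \infty$ is the maximum value of the integral, we must have $\int_{B \setminus V} p(y)\, d\nu(y) = 0$ for all $B \in \mathcal{S}$, and so $\nu((B \setminus V) \setminus N) = 0$ for any $B \in \mathcal{S}$.² Thus, defining $\nu'(B) := \nu(B \cap V)$, we have $\nu'(B \setminus N) = \nu(B \setminus N)$ for any $B \in \mathcal{S}$. As $p(y \mid x)$ is $\nu$-integrable for $P_X$-a.e. $x$, its support $S_x = \{y \in \mathbb{Y} : p(y \mid x) > 0\}$ must belong to $\mathcal{S}$ for $P_X$-a.e. $x$. It follows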



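\[
\begin{aligned}
P_{X,Y}(R) &= P_{X,Y}((R \cap S) \setminus (\mathbb{X} \times N)) \\
&= \int \left[ \int_{(R_x \cap S_x) \setminus N} p(y \mid x)\, d\nu(y) \right] dP_X(x) \\
&= \int \left[ \int_{(R_x \cap S_x) \setminus N} p(y \mid x)\, d\nu'(y) \right] dP_X(x) \\
&\le \int \left[ \int_{R_x} p(y \mid x)\, d\nu'(y) \right] dP_X(x) \\
&\le \int \left[ \int_{R_x} p(y \mid x)\, d\nu(y) \right] dP_X(x) = P_{X,Y}(R)
\end{aligned}
\]

for all $R \in \mathcal{X} \otimes \mathcal{Y}$ and so $p(y \mid x)$ is a conditional density w.r.t. $\nu'$, too. Furthermore, as $\nu'$ is $\sigma$-finite, Fubini's theorem yields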



\[
\int_B p(y)\, d\nu'(y) = \int_B \left[ \int p(y \mid x)\, dP_X(x) \right] d\nu'(y) = \int \left[ \int_B p(y \mid x)\, d\nu'(y) \right] dP_X(x) = P_Y(B)
\]

for all $B \in \mathcal{Y}$ and so $p(y)$ is a density of $Y$ w.r.t. $\nu'$.

² This implies that $\nu(B)$ can only attain the values $0$ and $\infty$ for any measurable $B \subset \mathbb{Y} \setminus (V \cup N)$. Hence, if $\nu$ is semifinite, only the value $0$ will be possible for these sets, and it follows that $\nu$ and $\nu'$ must agree on the support of $p(y)$. This proves the statement of footnote 1.

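(b) $\Rightarrow$ (c) Assuming that $p(y) = \int p(y \mid x)\, dP_X(x)$ is a density of $Y$ w.r.t. a $\sigma$-finite measure $\nu'$, let us show that the function $P_{X|y}$ defined by Bayes' formula is a well-defined conditional distribution. Using the definitions and Fubini's theorem, we obtain

\[
\begin{aligned}
\int \left[ \frac{\int_{S_y} p(y \mid x)\, dP_X(x)}{\int p(y \mid x)\, dP_X(x)} \right] dP_Y(y)
&= \int \left[ \int_{S_y} \frac{p(y \mid x)}{p(y)}\, dP_X(x) \right] dP_Y(y) \\
&= \int \left[ \int_{S_x} \frac{p(y \mid x)}{p(y)}\, dP_Y(y) \right] dP_X(x) \\
&= \int \left[ \int_{S_x} \frac{p(y \mid x)}{p(y)}\, p(y)\, d\nu'(y) \right] dP_X(x) \\
&= \int \left[ \int_{S_x} p(y \mid x)\, d\nu'(y) \right] dP_X(x) = P_{X,Y}(S).
\end{aligned}
\]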



(c') $\Leftrightarrow$ (c) obvious.

(c) $\Rightarrow$ (a) As $P_{X|y}$ is given as an integral over $P_X$, condition 4 follows. □



2.4 Generalization 

For completeness, we present a generalization of Theorem 1 to more than two random variables. To state the generalization, we need another definition.

Definition 1. A Bayes network is a directed acyclic graph representing a dependency structure of a set $X_1, \ldots, X_n$ of random variables. Each random variable $X_k$ is represented by a node whose parents are its conditioning variables $X_{j(k,1)}, \ldots, X_{j(k,n_k)}$, where we can assume WLOG that $j(k, i) < k$ for all $i = 1, \ldots, n_k$ (topological sorting), so that the joint distribution of $X_1, \ldots, X_n$ is given by the product

\[
P_{X_1, \ldots, X_n} = \prod_k P_{X_k \mid X_{j(k,1)}, \ldots, X_{j(k,n_k)}},
\]

where one can interpret, e.g., $P_{X_k \mid X_{j(k,1)}, \ldots, X_{j(k,n_k)}} = P_{X_k \mid X_1, \ldots, X_{k-1}}$ and then apply the transition measure product operator.
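
For finite-valued variables the product in Definition 1 is an ordinary product of conditional probability tables. The sketch below builds the joint distribution of a small network this way; the function name `joint_from_network`, the data layout, and the three-variable chain are assumptions of this illustration. Indices are 0-based in the code, whereas the text uses 1-based indices.

```python
from itertools import product

def joint_from_network(values, parents, cpts):
    """Joint pmf of a discrete Bayes network given in topological order.

    values:  list of value lists, one per variable
    parents: parents[k] is a tuple of indices j < k
    cpts:    cpts[k] maps (parent_values_tuple, x_k) -> P(x_k | parents)
    """
    joint = {}
    for xs in product(*values):
        p = 1.0
        for k, x_k in enumerate(xs):
            pa = tuple(xs[j] for j in parents[k])  # values of the parents of X_k
            p *= cpts[k][(pa, x_k)]
        joint[xs] = p
    return joint

# Hypothetical chain X_1 -> X_2 -> X_3 of binary variables.
values = [[0, 1]] * 3
parents = [(), (0,), (1,)]
# Each child copies its parent with probability 0.9.
flip = {((a,), b): (0.9 if a == b else 0.1) for a in (0, 1) for b in (0, 1)}
cpts = [{((), 0): 0.5, ((), 1): 0.5}, flip, flip]
joint = joint_from_network(values, parents, cpts)
```

Because every factor is a probability table over finitely many values, each conditional distribution here trivially has a density w.r.t. the counting measure, so this example falls under the conditions of the theorem that follows.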

Theorem 3. Let $X_1, \ldots, X_n$ be random variables. Then, the following are equivalent:

1. $X_1, \ldots, X_n$ have a joint density,

2. $P_{X_1, \ldots, X_n} \ll P_{X_1} \times \cdots \times P_{X_n}$,

3. $P_{X_1, \ldots, X_n}$ is representable as a Bayes network where each conditional distribution $P_{X_k \mid X_{j(k,\cdot)}}$ has a density w.r.t. a $\sigma$-finite measure $\mu_k$,

4. $P_{X_1, \ldots, X_n}$ is representable as a Bayes network where each conditional distribution $P_{X_k \mid X_{j(k,\cdot)}}$ is absolutely continuous w.r.t. $P_{X_k}$,

5. $P_{X_1, \ldots, X_n}$ is representable as a Bayes network where each conditional distribution $P_{X_k \mid X_{j(k,\cdot)}}$ is dominated by a measure $\mu_k$ w.r.t. which there exists a marginal density $p(x_k)$.

Furthermore,

6. if the above conditions hold for $X_1, \ldots, X_n$, then they also hold for $X'_k = F_k(X_k)$, where $F_k : \mathbb{X}_k \to \mathbb{X}'_k$ are any measurable functions.

Proof. This proof is a straightforward generalization of the proof of Theorem 1. For brevity, we shall denote the parents of $X_k$ by $x_{<k} := (x_{j(k,1)}, \ldots, x_{j(k,n_k)})$.

2 $\Rightarrow$ 5: The joint density $p := dP_{X_1, \ldots, X_n} / d(P_{X_1} \times \cdots \times P_{X_n})$ induces for each $k$ the conditional density $p(x_k \mid x_1, \ldots, x_{k-1})$ w.r.t. the marginal distribution $P_{X_k}$. Thus, the required Bayes network is given by $x_{<k} := (x_1, \ldots, x_{k-1})$ for all $k = 1, \ldots, n$.

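5 $\Rightarrow$ 4: Let $k \in \{1, \ldots, n\}$ be arbitrary. Denoting $N := \{x_k \in \mathbb{X}_k : p(x_k) = 0\}$, we have by the definition of conditional distribution

\[
0 = \int_N p(x_k)\, d\mu_k(x_k) = P_{X_k}(N) = \int P_{X_k \mid x_{<k}}(N)\, dP_{X_{<k}}(x_{<k}),
\]

which implies $P_{X_k \mid x_{<k}}(N) = 0$ for $P_{X_{<k}}$-a.e. $x_{<k}$. However, as $P_{X_k \mid x_{<k}}$ is only determined for $P_{X_{<k}}$-a.e. $x_{<k}$, we are free to modify it so that $P_{X_k \mid x_{<k}}(N) = 0$ for all $x_{<k}$. We will show that this $P_{X_k \mid x_{<k}}$ is dominated by $P_{X_k}$ for all $x_{<k}$. Let $S \in \mathcal{X}_k$ be such that $P_{X_k}(S) = 0$. Then, we have

\[
0 = P_{X_k}(S \setminus N) = \int_{S \setminus N} \underbrace{p(x_k)}_{> 0}\, d\mu_k(x_k),
\]

which implies $\mu_k(S \setminus N) = 0$. As $P_{X_k \mid x_{<k}} \ll \mu_k$, we have $P_{X_k \mid x_{<k}}(S \setminus N) = 0$, but as also $P_{X_k \mid x_{<k}}(N) = 0$, we obtain $P_{X_k \mid x_{<k}}(S) = 0$. Thus, $P_{X_k \mid x_{<k}} \ll P_{X_k}$ for all $x_{<k}$.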

4 $\Rightarrow$ 3: Choose $\mu_k = P_{X_k}$.

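3 $\Rightarrow$ 1: By the definition of the conditional densities and Fubini's theorem, we have

\[
P_{X_1, \ldots, X_n}(S) = \int_S \prod_k p(x_k \mid x_{<k})\, d\mu_k(x_k) = \int_S \Bigl[ \prod_k p(x_k \mid x_{<k}) \Bigr] d(\mu_1 \times \cdots \times \mu_n)(x).
\]

Thus, $\prod_k p(x_k \mid x_{<k})$ is a joint density of $X_1, \ldots, X_n$ w.r.t. $\mu_1 \times \cdots \times \mu_n$.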

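1 $\Rightarrow$ 2: Suppose that $p(x_1, \ldots, x_n)$ is a joint density w.r.t. $\mu_1 \times \cdots \times \mu_n$ and let $S \in \mathcal{X}_1 \otimes \cdots \otimes \mathcal{X}_n$ be an arbitrary measurable set such that $(P_{X_1} \times \cdots \times P_{X_n})(S) = 0$. We will show that then $P_{X_1, \ldots, X_n}(S) = 0$. Denoting

\[
\begin{aligned}
N_k &:= \{x_k \in \mathbb{X}_k : p(x_k) = 0\}, \\
N &:= \bigcup_k \mathbb{X}_1 \times \cdots \times \mathbb{X}_{k-1} \times N_k \times \mathbb{X}_{k+1} \times \cdots \times \mathbb{X}_n,
\end{aligned}
\]

we have $P_{X_k}(N_k) = 0$ for all $k$. Furthermore, as $\mu_1 \times \cdots \times \mu_n$ is $\sigma$-finite on $S \setminus N$, Fubini's theorem yields

\[
0 = (P_{X_1} \times \cdots \times P_{X_n})(S \setminus N) = \int_{S \setminus N} \prod_k p(x_k)\, d\mu_k(x_k) = \int_{S \setminus N} \underbrace{\Bigl[ \prod_k p(x_k) \Bigr]}_{> 0}\, d(\mu_1 \times \cdots \times \mu_n)(x),
\]

which implies that $(\mu_1 \times \cdots \times \mu_n)(S \setminus N) = 0$ and so $P_{X_1, \ldots, X_n}(S \setminus N) = 0$. Thus,

\[
P_{X_1, \ldots, X_n}(S) \le P_{X_1, \ldots, X_n}(S \setminus N) + \sum_k P_{X_k}(N_k) = 0.
\]

2 $\Rightarrow$ 6: Suppose that $F_k : \mathbb{X}_k \to \mathbb{X}'_k$ are arbitrary measurable mappings. We show that $P_{X_1, \ldots, X_n} \ll P_{X_1} \times \cdots \times P_{X_n}$ implies $P_{F_1(X_1), \ldots, F_n(X_n)} \ll P_{F_1(X_1)} \times \cdots \times P_{F_n(X_n)}$. For any $S \in \mathcal{X}'_1 \otimes \cdots \otimes \mathcal{X}'_n$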